week 4: overfitting/mcmc

mcmc – brms

At this point in the term, we’ll be deviating in our code from McElreath. His course is taught entirely using rethinking, which is a pedogigical tool. It has clear mapping between mathematical models and syntax. But it lacks flexibility and has fewer modeling options.

On the other hand, there is a package called brms that also does Bayesian modeling. This package uses syntax simliar to lme4 (if you’ve used that), supports a wider range of distributions, integrates with the tidyverse ecosystem (if you’ve used that), has more extensive documentation, is more actively maintained, is more widely used (i.e., more support), and is more suitable for complex models.

You’re welcome to use the rethinking package when it suits you, in this course and in your research, but my goal is to introduce you to the brms package. Instead of reviewing the code from McElreath’s lecture today, we’ll be revisiting some familiar models using brms.

model specification

Let’s return to the height and weight data.

data(Howell1, package = "rethinking")
d <- Howell1
library(measurements)
d$height <- conv_unit(d$height, from = "cm", to = "feet")
d$weight <- conv_unit(d$weight, from = "kg", to = "lbs")
describe(d, fast = T)
       vars   n  mean    sd median  min    max range  skew kurtosis   se
height    1 544  4.54  0.91   4.88 1.77   5.88   4.1 -1.26     0.58 0.04
weight    2 544 78.51 32.45  88.31 9.37 138.87 129.5 -0.54    -0.94 1.39
age       3 544 29.34 20.75  27.00 0.00  88.00  88.0  0.49    -0.56 0.89
male      4 544  0.47  0.50   0.00 0.00   1.00   1.0  0.11    -1.99 0.02
d <- d[d$age >= 18, ]
d$height_c <- d$height - mean(d$height)

\[\begin{align*} w_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta (h_i - \bar{h}) \\ \alpha &\sim \text{Normal}(130, 20) \\ \beta &\sim \text{Normal}(0, 25) \\ \sigma &\sim \text{Uniform}(0, 25) \\ \end{align*}\]

m42.1 <-brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000, chains = 4, 
  seed = 3, 
      file = here("fits/m42.1"))

brm() is the core function for fitting Bayesian models using brms.

m42.1 <-brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000, chains = 4,
  seed = 3, 
      file = here("fits/m42.1"))

family specifies the distribution of the outcome family. In many examples, we’ll use a gaussian (normal) distribution. But there are many many many options for this.

m42.1 <-brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000, chains = 4,
  seed = 3, 
      file = here("fits/m42.1"))

The formula argument is what you would expect from the lm() and lmer() functions you have seen in the past. The benefit of brms is that this formula can easily handle complex and non-linear terms. We’ll be playing with more in future classes.

m42.1 <-brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000, chains = 4,
  seed = 3, 
      file = here("fits/m42.1"))

Here we set our priors. Class b refers to population-level slope parameters (sometimes called fixed effects). Again, this argument has the ability to become very detailed, specific, and flexible, and we’ll play more with this.

m42.1 <-brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000, chains = 4,
  seed = 3, 
      file = here("fits/m42.1"))

Hamiltonian MCMC runs for a set number of iterations, throws away the first bit (the warmup), and does that up multiple times (the number of chains).

m42.1 <-brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000, chains = 4,
  seed = 3, 
      file = here("fits/m42.1"))

Remember, these are random walks through parameter space, so set a seed for reproducbility. Also, these can take a while to run, especially when you are developing more complex models. If you specify a file, the output of the model will automatically be saved. Even better, then next time you run this code, R will check for that file and load it into your workspace instead of re-running the model. (Just be sure to delete the model file if you make changes to any other part of the code.)

summary(m42.1)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: weight ~ 1 + height_c 
   Data: d (Number of observations: 352) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    99.21      0.50    98.23   100.18 1.00    14807    12164
height_c     42.05      1.95    38.22    45.91 1.00    15921    12552

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     9.38      0.36     8.72    10.12 1.00    15626    12366

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(m42.1)

checking your model

Before we start to interpret our model, we should evaluate whether it does a good job. Posterior predictive checks plot the implied distribution of your outcome next to your actual distribution. If your posterior predictive values are off, your model is off.

pp_check(m42.1)

checking your model

We can also look at the posterior distributions of our chains. Remember, they should be covering most of the same space, so these distributions should pretty much overlap.

mcmc_plot(m42.1, type = "dens_overlay")

Let’s sample from the posterior. First, get_variables() will tell us everything at our disposal.

get_variables(m42.1)
 [1] "b_Intercept"   "b_height_c"    "sigma"         "Intercept"    
 [5] "lprior"        "lp__"          "accept_stat__" "stepsize__"   
 [9] "treedepth__"   "n_leapfrog__"  "divergent__"   "energy__"     

Let’s focus on just the parameters we’ve estimated. In prior lectures, we’ve drawn samples from the posterior distribution to generate plots and provide summaries. We can use the spread_draws() function to do so.

p42.1 <- m42.1 %>% 
  spread_draws(b_Intercept, b_height_c, sigma, 
               ndraws = 1e4, seed = 123)
dim(p42.1)
[1] 10000     6
head(p42.1)
# A tibble: 6 × 6
  .chain .iteration .draw b_Intercept b_height_c sigma
   <int>      <int> <int>       <dbl>      <dbl> <dbl>
1      1       2463  2463        99.6       41.6  9.59
2      1       2511  2511        99.7       41.4  9.32
3      3       2419 10419        99.0       42.8  8.85
4      3        718  8718        99.4       39.9  8.74
5      4        483 12483        99.6       41.9  9.17
6      1       2986  2986       100.        43.0  9.46
Code
p42.1 %>% 
  ggplot(aes(x = b_Intercept)) +
  geom_density(fill = "#1c5253", color = "white") +
  labs(
    title = "Posterior probability",
    x = "probabilty of intercept (mean weight)"
  ) + 
  scale_y_continuous(NULL, breaks = NULL)

Finally, we might want to plot the bivariate distributions of our parameters.

pairs(m42.1)

If we were encountering this problem for the first time, we would want to work on on our priors. These ones are pretty bad. We have a few tools available to help us define and test our priors.

First, let’s view the available priors for our model:

get_prior(
  formula = weight ~ 1 + height_c,
  data = d
)
                    prior     class     coef group resp dpar nlpar lb ub
                   (flat)         b                                     
                   (flat)         b height_c                            
 student_t(3, 98.7, 14.8) Intercept                                     
    student_t(3, 0, 14.8)     sigma                                 0   
       source
      default
 (vectorized)
      default
      default

If you’re ever not sure what coefficients to put priors on, this function can help with that.

Let’s refit our model with our earlier priors. Before we fit this to data, we’ll start by only sampling from our priors.

m42.1p <- brm(
  data = d, 
  family = gaussian,
  weight ~ 1 + height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( normal(0,25), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000,
  seed = 3, 
  sample_prior = "only")

The output of spread_draws will now draw from samples from the prior, not samples from the posterior.

p42.1p <- m42.1p %>% 
  spread_draws(b_Intercept, b_height_c, sigma)
head(p42.1p)
# A tibble: 6 × 6
  .chain .iteration .draw b_Intercept b_height_c sigma
   <int>      <int> <int>       <dbl>      <dbl> <dbl>
1      1          1     1        144.       17.0  31.2
2      1          2     2        116.      -22.1  18.4
3      1          3     3        139.      -37.5  32.0
4      1          4     4        158.      -23.6  10.9
5      1          5     5        102.       20.3  35.9
6      1          6     6        139.       13.7  32.7

We’ll plot the regression lines from the priors against the real data, to see if they make sense.

Code
labels = seq(4, 6, by = .5)
breaks = labels - mean(d$height)
d %>% 
  ggplot(aes(x = height_c, y = weight)) + 
  geom_blank()+
  geom_abline(aes( intercept=b_Intercept, slope=b_height_c), 
              data = p42.1p[1:50, ], #first 50 draws only
              color = "#1c5253",
              alpha = .3) +
  scale_x_continuous("height(feet)", breaks = breaks, labels = labels) +
  scale_y_continuous("weight(lbs)", limits = c(50,150))

Let’s see if we can improve upon this model. One thing we know for sure is that the relationship between height and weight is positive. We may not know the exact magnitude, but we can use a distribution that doesn’t go below zero. We’ve already discussed uniform distributions, but those are pretty uninformative – they won’t do a good job regularizing – and we can also run into trouble if our bounds are not inclusive enough.

The log-normal distribution would be a good option here.

Code
set.seed(4)

tibble(b = rlnorm(1e4, mean = 0, sd = 1)) %>% 
  ggplot(aes(x = b)) +
  geom_density(fill = "grey92") +
  coord_cartesian(xlim = c(0, 5)) +
  labs(title = "Log-Normal(0,1)")

The log-normal is the distribution whose logarithm is normally distributed.

Code
set.seed(4)

tibble(rnorm           = rnorm(1e5, mean = 0, sd = 1),
       `log(rlognorm)` = log(rlnorm(1e5, mean = 0, sd = 1))) %>% 
  pivot_longer(everything()) %>% 

  ggplot(aes(x = value)) +
  geom_density(fill = "grey92") +
  coord_cartesian(xlim = c(-3, 3)) +
  facet_wrap(~ name, nrow = 2)

Let’s try this new prior. Play around with the plot code to find parameters that you think are reasonable. I’m going to use 1,2.

m42.2p <- brm(
  data = d, 
  family = gaussian,
  weight ~ height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( lognormal(1,2), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000,
  seed = 3, 
  sample_prior = "only")
Code
p42.2p <- m42.2p %>% 
  spread_draws(b_Intercept, b_height_c, sigma)
d %>% 
  ggplot(aes(x = height_c, y = weight)) + 
   geom_blank()+
  geom_abline(aes( intercept=b_Intercept, slope=b_height_c), 
              data = p42.2p[1:50, ], #first 50 draws only
              color = "#1c5253",
              alpha = .3) +
  scale_x_continuous("height(feet)", breaks = breaks, labels = labels) +
  scale_y_continuous("weight(lbs)", limits = c(50,150))

Applied to our dataset:

Code
m42.2 <- brm(
  data = d, 
  family = gaussian,
  weight ~ height_c,
  prior = c( prior( normal(130,20), class = Intercept),
             prior( lognormal(1,2), class = b),
             prior( uniform(0,50), class = sigma, ub = 50)
    ), 
  iter = 5000, warmup = 1000,
  seed = 3,
  file = here("fits/m42.2"))
summary(m42.2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: weight ~ height_c 
   Data: d (Number of observations: 352) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    99.20      0.50    98.22   100.18 1.00    13657    11564
height_c     42.16      2.01    38.26    46.07 1.00    17384    12434

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     9.39      0.36     8.72    10.13 1.00    15405    11446

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s return to the tidybayes functions for summaries. As a reminder, we already saw spread_draws()

post_draws = m42.2 %>% 
  spread_draws(b_Intercept, b_height_c, sigma) %>% 
  sample_n(50) 

m_height <- mean(d$height)

d %>% 
  ggplot(aes(x = height, y = weight)) +
  geom_point(alpha = .5) + 
  geom_abline(aes(intercept = b_Intercept - b_height_c*m_height, #to account for centering
                  slope = b_height_c),
              alpha = .3, 
              color = "#1c5253",
              data = post_draws)

exercise (setup)

Let’s practice what we’ve learned using a different dataset. We’ll use the built-in msleep dataset from the ggplot2 package, which contains sleep data for various mammals.

data("msleep")
d_sleep <- msleep %>%
  drop_na(sleep_total, bodywt) %>%
  mutate(log_weight = log(bodywt))

# Quick look at the data
describe(d_sleep[c("sleep_total", "bodywt", "log_weight")], fast = T)
            vars  n   mean     sd median  min    max  range skew kurtosis    se
sleep_total    1 83  10.43   4.45  10.10  1.9   19.9   18.0 0.05    -0.71  0.49
bodywt         2 83 166.14 786.84   1.67  0.0 6654.0 6654.0 7.10    53.72 86.37
log_weight     3 83   0.84   3.26   0.51 -5.3    8.8   14.1 0.30    -0.76  0.36

Our goal is to model the relationship between body weight and total sleep duration. Because body weight is highly skewed, we’ll use the log-transformed weight.

exercise

Let’s set up a model to predict sleep duration from log body weight:

\[\begin{align*} \text{sleep}_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta (\text{log_weight}_i) \\ \alpha &\sim \text{Normal}(12, 4) \\ \beta &\sim \text{Normal}(0, 2) \\ \sigma &\sim \text{Uniform}(0, 10) \\ \end{align*}\]

  1. Create a model using brm() that samples only from the priors (sample_prior = "only")
  2. Plot some regression lines from your prior to see if they make sense
  3. Adjust the priors if needed

solution

m_sleep_prior <- brm(
  data = d_sleep,
  family = gaussian,
  sleep_total ~ log_weight,
  prior = c(
    prior(normal(12, 4), class = Intercept),
    prior(normal(0, 2), class = b),
    prior(uniform(0, 10), class = sigma)
  ),
  sample_prior = "only",
  seed = 123
)
Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 16.0.0 (clang-1600.0.26.6)’
using SDK: ‘MacOSX15.2.sdk’
clang -arch arm64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 2e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.01 seconds (Warm-up)
Chain 1:                0.011 seconds (Sampling)
Chain 1:                0.021 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 2:                0.006 seconds (Sampling)
Chain 2:                0.019 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 1e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.011 seconds (Warm-up)
Chain 3:                0.022 seconds (Sampling)
Chain 3:                0.033 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.011 seconds (Warm-up)
Chain 4:                0.009 seconds (Sampling)
Chain 4:                0.02 seconds (Total)
Chain 4: 

solution

Code
p_sleep_p <- m_sleep_prior %>% 
  spread_draws(b_Intercept, b_log_weight, sigma)

d_sleep %>% 
  ggplot(aes(x = log_weight, y = sleep_total)) + 
   geom_blank()+
  geom_abline(aes( intercept=b_Intercept, slope=b_log_weight), 
              data = p_sleep_p[1:50, ], #first 50 draws only
              color = "#1c5253",
              alpha = .3) +
  labs(
    x = "weight (log)",
    y = "sleep"
  )

exercise

Once you’re happy with your priors, fit the actual model to the data. Then:

  1. Check the model summary
  2. Create a posterior predictive check plot
  3. Plot the actual data with regression lines from your posterior draws

solution

m_sleep_fit <- brm(
  data = d_sleep,
  family = gaussian,
  sleep_total ~ log_weight,
  prior = c(
    prior(normal(12, 4), class = Intercept),
    prior(normal(0, 2), class = b),
    prior(uniform(0, 10), class = sigma)
  ),
  seed = 123,
  file = here("fits/m2.sleep")
)
summary(m_sleep_fit)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: sleep_total ~ log_weight 
   Data: d_sleep (Number of observations: 83) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     11.10      0.42    10.28    11.92 1.00     4051     2987
log_weight    -0.78      0.13    -1.02    -0.53 1.00     3865     2709

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.74      0.30     3.23     4.37 1.00     3676     2744

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(m_sleep_fit)

solution

Code
p_sleep <- m_sleep_fit %>% 
  spread_draws(b_Intercept, b_log_weight, sigma)

d_sleep %>% 
  ggplot(aes(x = log_weight, y = sleep_total)) + 
  geom_point()+
  geom_abline(aes( intercept=b_Intercept, slope=b_log_weight), 
              data = p_sleep_p[1:50, ], #first 50 draws only
              color = "#1c5253",
              alpha = .3) +
  labs(
    x = "weight (log)",
    y = "sleep"
  )

We also have gather_draws():

m42.2 %>% 
  gather_draws(b_Intercept, b_height_c, sigma)   %>% 
  sample_n(2)
# A tibble: 6 × 5
# Groups:   .variable [3]
  .chain .iteration .draw .variable   .value
   <int>      <int> <int> <chr>        <dbl>
1      2       2462  6462 b_Intercept  99.4 
2      4       2592 14592 b_Intercept  99.2 
3      1       3817  3817 b_height_c   42.4 
4      3       3483 11483 b_height_c   45.2 
5      1       1495  1495 sigma         8.94
6      4       1851 13851 sigma         9.49

How is this different from spread_draws()?

gather_draws() is a useful function if we’re thinking about summarizing the results of our models.

m42.2 %>% 
  gather_draws(b_Intercept, b_height_c, sigma) %>% 
  median_qi()
# A tibble: 3 × 7
  .variable   .value .lower .upper .width .point .interval
  <chr>        <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 b_height_c   42.1   38.3    46.1   0.95 median qi       
2 b_Intercept  99.2   98.2   100.    0.95 median qi       
3 sigma         9.38   8.72   10.1   0.95 median qi       
m42.2 %>% 
  gather_draws(b_Intercept, b_height_c, sigma) %>% 
  ggplot(aes(x = .value, y=.variable)) +
  stat_halfeye()

model-based predictions

We can make two kinds of predictions based on our model. First, we can get a posterior predictive distribution using add_predicted_draws():

# simulate new data
height_c = sample(d$height_c, size = 1e2, replace = T)
# get predictions
predictions = data.frame(height_c) %>% add_predicted_draws(m42.2, seed = 1)
dim(predictions)
[1] 1600000       6
head(predictions)
# A tibble: 6 × 6
# Groups:   height_c, .row [1]
  height_c  .row .chain .iteration .draw .prediction
     <dbl> <int>  <int>      <int> <int>       <dbl>
1   -0.301     1     NA         NA     1        81.5
2   -0.301     1     NA         NA     2        88.5
3   -0.301     1     NA         NA     3        78.7
4   -0.301     1     NA         NA     4       103. 
5   -0.301     1     NA         NA     5        89.1
6   -0.301     1     NA         NA     6        78.2

Or, we can get expected values using add_epred_draws():

# get expected values
expected_vals = data.frame(height_c) %>% add_epred_draws(m42.2, seed = 1)
dim(expected_vals)
[1] 1600000       6
head(expected_vals)
# A tibble: 6 × 6
# Groups:   height_c, .row [1]
  height_c  .row .chain .iteration .draw .epred
     <dbl> <int>  <int>      <int> <int>  <dbl>
1   -0.301     1     NA         NA     1   87.4
2   -0.301     1     NA         NA     2   86.8
3   -0.301     1     NA         NA     3   86.4
4   -0.301     1     NA         NA     4   87.4
5   -0.301     1     NA         NA     5   85.9
6   -0.301     1     NA         NA     6   85.7
predictions %>% full_join(expected_vals) %>% 
  pivot_longer(c(.prediction, .epred)) %>% 
  ggplot(aes(x=value, group = name)) +
  geom_density(aes(fill=name), alpha=.5)
predictions %>% full_join(expected_vals) %>% 
  pivot_longer(c(.prediction, .epred)) %>% 
  sample_n(size = 200) %>% 
  mutate(height = height_c + m_height) %>% 
  ggplot(aes(x=height, y=value, group = name)) +
    geom_point(alpha = .3) +
  facet_wrap(~name)

exercise

Using the tools we learned (spread_draws(), gather_draws(), etc.):

  1. What is the estimated effect of body weight on sleep duration?
  2. Generate predictions for a few new body weights
  3. Create a plot showing the uncertainty in your predictions

solution

p_sleep %>% 
  gather_draws(b_Intercept, b_log_weight, sigma) %>% 
  median_qi
# A tibble: 3 × 7
  .variable    .value .lower .upper .width .point .interval
  <chr>         <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 b_Intercept  11.1    10.3  11.9     0.95 median qi       
2 b_log_weight -0.776  -1.02 -0.527   0.95 median qi       
3 sigma         3.71    3.23  4.37    0.95 median qi       

solution

log_weight = sample(d_sleep$log_weight, replace = T, size = 10)
predictions = data.frame(log_weight) %>% add_predicted_draws(m_sleep_fit, seed = 1)
head(predictions)
# A tibble: 6 × 6
# Groups:   log_weight, .row [1]
  log_weight  .row .chain .iteration .draw .prediction
       <dbl> <int>  <int>      <int> <int>       <dbl>
1       1.92     1     NA         NA     1        7.46
2       1.92     1     NA         NA     2       10.4 
3       1.92     1     NA         NA     3        6.34
4       1.92     1     NA         NA     4       14.6 
5       1.92     1     NA         NA     5       11.4 
6       1.92     1     NA         NA     6        6.43
predictions %>% 
  ggplot(aes(x = .prediction)) +
  geom_density(aes(x = sleep_total), data = msleep) +
  geom_histogram(aes(y = ..density..), fill = "#1c5253", color = "white", alpha = .3)

model fit and comparisons

If you want to get the pareto-smoothed importance sampling:

loo1 <- loo(m42.2, save_psis = T)
loo1

Computed from 16000 by 352 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1288.5 14.0
p_loo         3.2  0.4
looic      2577.1 27.9
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.1]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

And for the widely applicable information criteria:

waic(m42.2)

Computed from 16000 by 352 log-likelihood matrix.

          Estimate   SE
elpd_waic  -1288.5 14.0
p_waic         3.2  0.4
waic        2577.1 27.9

Remember, these are primarily used to compare multiple models. See the loo package for more functions to help you compare models and identify influential data points.

bonus

Try adding a new predictor to your model (e.g., vore - what type of food the animal eats). How does this change your predictions?